Fast and Efficient Log File Compression

Authors

  • Przemyslaw Skibinski
  • Jakub Swacha

Abstract

Contemporary information systems are replete with log files, created in multiple places (e.g., network servers, database management systems, user monitoring applications, system services and utilities) for multiple purposes (e.g., maintenance, security issues, traffic analysis, legal requirements, software debugging, customer management, user interface usability studies). Log files in complex systems may quickly grow to huge sizes, and they often must be kept for long periods of time. For reasons of convenience and storage economy, log files should be compressed. However, most of the available log file compression tools use general-purpose algorithms (e.g., Deflate) that do not take advantage of the redundancy specific to log files. In this paper, a specialized log file compression scheme is described in five variants, differing in complexity and attained compression ratios. The proposed scheme introduces a log file transform whose output is much more compressible with general-purpose algorithms than the original data. Using the fast Deflate algorithm, the transformed log files were, on average, 36.6% shorter than the original files compressed with gzip (which employs the same algorithm). Using the slower PPMVC algorithm, the transformed files were 62% shorter than the original files compressed with gzip, and 41% shorter than the original files compressed with bzip2.
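
The abstract does not spell out the transform itself. As a rough illustration of why a reversible preprocessing step can help a general-purpose compressor, the Python sketch below assumes one simple idea, which is not necessarily any of the paper's five variants: each space-separated token that equals the token at the same position in the previous line is replaced by a one-byte marker, and the result is then compressed with Deflate through the standard zlib module. The names transform, inverse, deflate_size and the marker character \x01 are hypothetical, and the sketch assumes log lines never contain \x01.

```python
import zlib

# Hypothetical marker meaning "same token as in the previous line, same position".
# Assumption: real log lines never contain this character.
MATCH = "\x01"
SEP = " "


def transform(lines: list[str]) -> list[str]:
    """Replace tokens repeated from the previous line (same position) with MATCH."""
    out = []
    prev: list[str] = []
    for line in lines:
        tokens = line.split(SEP)
        encoded = [
            MATCH if i < len(prev) and tok == prev[i] else tok
            for i, tok in enumerate(tokens)
        ]
        out.append(SEP.join(encoded))
        prev = tokens  # compare against the original tokens of the previous line
    return out


def inverse(lines: list[str]) -> list[str]:
    """Undo transform() by copying matched tokens from the previously decoded line."""
    out = []
    prev: list[str] = []
    for line in lines:
        encoded = line.split(SEP)
        tokens = [prev[i] if tok == MATCH else tok for i, tok in enumerate(encoded)]
        out.append(SEP.join(tokens))
        prev = tokens
    return out


def deflate_size(text: str) -> int:
    """Size in bytes after Deflate (zlib container) at the highest compression level."""
    return len(zlib.compress(text.encode("utf-8"), 9))


if __name__ == "__main__":
    # Synthetic Apache-style log lines; purely illustrative data.
    log = [
        f'192.168.0.{i % 4} - - [10/Oct/2007:13:55:{i % 60:02d}] "GET /index.html HTTP/1.0" 200 {1000 + i}'
        for i in range(1000)
    ]
    encoded = transform(log)
    assert inverse(encoded) == log  # lossless round trip
    print("original, Deflate:", deflate_size("\n".join(log)))
    print("transformed, Deflate:", deflate_size("\n".join(encoded)))
```

Running it prints the Deflate-compressed sizes of the original and the transformed synthetic log. The numbers depend entirely on the input and are not the paper's results; the round-trip assertion only shows that this particular transform is lossless under the stated assumption.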

Similar resources

FELFCNCA: Fast & Efficient Log File Compression Using Non Linear Cellular Automata Classifier

Log files are created at multiple places, such as system services, user monitoring applications, network servers, and database management systems, for traffic analysis, maintenance, software debugging, and customer management, and they must be kept for long periods of time. These log files may grow to huge sizes in such complex systems and environments. For storage and convenience, log files must be compressed. ...

Efficient Mixed Mode Summary for Mobile Networks

Cellular network monitoring and management tasks are based on huge amounts of data continuously collected from network elements and devices. Log files are used to store this data, and they may accumulate millions of lines in one day. This log uses the standard GPEH format, which stands for General Performance Event Handling. This log is usually recorded in a binary format (bin)...

MEDICAL IMAGE COMPRESSION: A REVIEW

In recent years, the use of medical images for diagnostic purposes has become a necessity. Limited transmission capacity and storage space, together with the growing size of medical images, have created the need for efficient methods; image compression is therefore required as an efficient way to reduce irrelevance and redundancy in the image data so that it can be stored or transmitted. It also reduces...

Write-Optimized B-Trees

Large writes are beneficial both on individual disks and on disk arrays, e.g., RAID-5. The presented design enables large writes of internal B-tree nodes and leaves. It supports both in-place updates and large append-only (“log-structured”) write operations within the same storage volume, within the same B-tree, and even at the same time. The essence of the proposal is to make page migration in...

Efficient Text Compression Using Special Character Replacement and Space Removal

In this paper, we propose a new text compression/decompression algorithm using a special character replacement technique. After the initial compression by special character replacement, we remove the spaces between words in the intermediate compressed file in specific situations to obtain the final compressed text file. Experimental results show that the proposed...

Publication date: 2007